Disambiguation Strategies for Cross-Language Information Retrieval
Abstract
This paper gives an overview of the tools and methods for Cross-Language Information Retrieval (CLIR) developed within the Twenty-One project. The tools and methods are evaluated on the TREC CLIR task document collection, using Dutch queries against the English document base. The main issue addressed is an evaluation of two approaches to disambiguation. The underlying question is whether considerable effort should go into finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results. The experimental study suggests that the quality of the search methods matters more than the quality of the disambiguation methods: good retrieval methods are able to disambiguate translated queries implicitly during searching.
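The contrast between the two approaches can be made concrete with a toy sketch. It is not the Twenty-One system's method: the dictionary entries, documents, and function names below are hypothetical, and the scoring (counting how many query terms have at least one translation in a document) is a deliberately simple stand-in for a real retrieval model.

```python
# Toy illustration only: dictionary, documents and scoring are assumptions,
# not taken from the paper.

# Hypothetical Dutch-to-English dictionary. The first listed translation of
# "bank" is the wrong sense for a query about finance.
DICTIONARY = {
    "bank": ["couch", "bank"],
    "rente": ["interest"],
}

DOCS = {
    "d1": "the bank raised the interest rate",
    "d2": "a couch and a chair",
    "d3": "interest rates fell",
}


def translate_single(query):
    """Commit to one translation per term (here: naively the first listed)."""
    return [[DICTIONARY[t][0]] for t in query if t in DICTIONARY]


def translate_all(query):
    """Keep every translation of a term together as one group of alternatives."""
    return [list(DICTIONARY[t]) for t in query if t in DICTIONARY]


def score(groups, text):
    """Count the query terms that have at least one translation in the text."""
    words = set(text.split())
    return sum(1 for group in groups if words & set(group))


def rank(groups):
    """Return (doc_id, score) pairs, best first, dropping non-matching docs."""
    scored = [(doc_id, score(groups, text)) for doc_id, text in DOCS.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


query = ["bank", "rente"]
single_ranking = rank(translate_single(query))
all_ranking = rank(translate_all(query))
```

In this toy setting, committing to the wrong translation leaves all three documents tied, whereas keeping both translations lets the document that matches both query terms stand out. That is the sense in which a search method can disambiguate implicitly.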
Similar resources
Evaluating Wordnets in Cross-language Information Retrieval: the ITEM Search Engine
This paper presents the ITEM multilingual search engine. This search engine performs full lexical processing (morphological analysis, tagging and Word Sense Disambiguation) on documents and queries in order to provide language-neutral indexes for querying and retrieval. The indexing terms are the EuroWordNet/ITEM InterLingual Index records that link wordnets in 10 languages of the European Comm...
Word Sense Disambiguation for Cross-Language Information Retrieval
We have developed a word sense disambiguation algorithm, following Cheng and Wilensky (1997), to disambiguate among WordNet synsets. This algorithm is to be used in a cross-language information retrieval system, CINDOR, which indexes queries and documents in a language-neutral concept representation based on WordNet synsets. Our goal is to improve retrieval precision through word sense disambig...
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-language Information Retrieval
One of the aims of EuroWordNet (EWN) was to provide a resource for Cross-Language Information Retrieval (CLIR). In this paper we present experiments which test the usefulness of EWN for this purpose via a formal evaluation using the Spanish queries from the TREC6 CLIR test set. All CLIR systems using bilingual dictionaries must find a way of dealing with multiple translations and we employ a WS...
Using Structured Queries for Disambiguation in Cross-Language Information Retrieval
Bilingual transfer dictionaries are an important resource for query translation in cross-language text retrieval. However, term translation is not an isomorphic process, so dictionary-based systems must address the problem of ambiguity in language translation. In this paper, we claim that boolean conjunction (the AND operator) provides simple and automatic disambiguation in the target languag...
Cross-Language Retrieval with Wikipedia
We demonstrate a twofold use of Wikipedia for cross-lingual information retrieval. As our main contribution, we exploit Wikipedia hyperlinkage for query term disambiguation. We also use bilingual Wikipedia articles for dictionary extension. Our method is based on translation disambiguation; we combine the Wikipedia based technique with a method based on bigram statistics of pairs formed by tran...
Translation Probabilities in Cross-language Information Retrieval
Translation ambiguity is a major problem in dictionary-based cross-language information retrieval. To attack the problem, indirect disambiguation approaches, which do not explicitly resolve translation ambiguity, rely on query-structuring techniques such as a structured Boolean model and Pirkola’s method. Direct disambiguation approaches try to assign translation probabilities to translation eq...
Publication date: 1999